1,229 research outputs found

    SDRS: a new lossless dimensionality reduction for text corpora

    In recent years, most content-based spam filters have been implemented using Machine Learning (ML) approaches by means of token-based representations of textual contents. Despite multiple performance enhancements, the impact on filtering accuracy has been virtually negligible. Recent studies have introduced synset-based content representations as a reliable way to improve classification, as well as different ways of exploiting semantic information to address problems such as dimensionality reduction. These preliminary solutions present some limitations and enforce simplifications that must be gradually redefined in order to obtain significant improvements in spam content filtering. This study addresses the problem of feature reduction by introducing a new semantic-based proposal (SDRS) that avoids losing knowledge (lossless). Synset features can be semantically grouped by taking advantage of taxonomic relations (mainly hypernyms) provided by the BabelNet ontological dictionary (e.g. “Viagra” and “Cialis” can be summarized into the single feature “anti-impotence drug”, “drug” or “chemical substance”, depending on whether the generalization spans 1, 2 or 3 levels). In order to decide how many levels should be used to generalize each synset of a dataset, our proposal takes advantage of Multi-Objective Evolutionary Algorithms (MOEA) and, in particular, of the Non-dominated Sorting Genetic Algorithm (NSGA-II). We have compared the performance achieved by a Naïve Bayes classifier, using both token-based and synset-based dataset representations, with and without executing dimensionality reductions. As a result, our lossless semantic reduction strategy was able to find optimal semantic-based feature grouping strategies for the input texts, leading to better performance of Naïve Bayes classifiers.
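The level-based grouping described in the abstract above can be illustrated with a minimal sketch. The taxonomy below is a hypothetical hand-written stand-in for BabelNet (no real BabelNet API is called), and the per-synset levels are picked by hand; in the paper they are what NSGA-II searches for. The names `HYPERNYM`, `generalize` and `reduce_features` are assumptions made for this illustration.

```python
from collections import Counter

# Hypothetical toy taxonomy: synset -> direct hypernym (not real BabelNet data).
HYPERNYM = {
    "viagra": "anti-impotence drug",
    "cialis": "anti-impotence drug",
    "anti-impotence drug": "drug",
    "drug": "chemical substance",
}


def generalize(synset, levels):
    """Climb up to `levels` hypernym steps, stopping early at the taxonomy root."""
    for _ in range(levels):
        if synset not in HYPERNYM:
            break
        synset = HYPERNYM[synset]
    return synset


def reduce_features(counts, levels_per_synset):
    """Merge synset counts after generalizing each synset by its own level."""
    reduced = Counter()
    for synset, count in counts.items():
        target = generalize(synset, levels_per_synset.get(synset, 0))
        reduced[target] += count
    return reduced


# One document as synset counts; "meeting" has no assigned level and is kept as-is.
doc = Counter({"viagra": 2, "cialis": 1, "meeting": 3})
levels = {"viagra": 1, "cialis": 1}  # hand-picked here; an optimizer would choose these
reduced = reduce_features(doc, levels)
print(dict(reduced))  # {'anti-impotence drug': 3, 'meeting': 3}
```

The total count is unchanged by the merge (six occurrences before and after), which is the sense in which such a grouping is lossless: occurrences are generalized, not discarded.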

    Multi-objective evolutionary optimization for dimensionality reduction of texts represented by synsets

    Despite new developments in machine learning classification techniques, improving the accuracy of spam filtering is a difficult task due to linguistic phenomena that limit its effectiveness. In particular, we highlight polysemy, synonymy, the usage of hypernyms/hyponyms, and the presence of irrelevant/confusing words. These problems should be solved at the pre-processing stage to avoid using inconsistent information in the building of classification models. Previous studies have suggested that synset-based representation strategies could be successfully used to solve synonymy and polysemy problems. In addition, hyponymy/hypernymy relations can be exploited to implement dimensionality reduction strategies. These strategies could unify textual terms to model the intentions of the document without losing any information (e.g., bringing together the synsets “viagra”, “cialis”, “levitra” and others representing similar drugs by using “virility drug”, which is a hypernym of all of them). These feature reduction schemes are known as lossless strategies, as the information is not removed but only generalised. However, in some types of text classification problems (such as spam filtering) it may not be worthwhile to keep all the information, and it may be preferable to let dimensionality reduction algorithms discard information that may be irrelevant or confusing. In this work, we formulate feature reduction as a multi-objective optimisation problem to be solved using a Multi-Objective Evolutionary Algorithm (MOEA). With minor modifications, our algorithm can implement lossless (using only semantic-based synset grouping), low-loss (discarding irrelevant information and using semantic-based synset grouping) or lossy (discarding only irrelevant information) strategies.
The contribution of this study is two-fold: (i) to introduce different dimensionality reduction methods (lossless, low-loss and lossy) as an optimization problem that can be solved using MOEA and (ii) to provide an experimental comparison of lossless and low-loss schemes for text representation. The results obtained support the usefulness of the low-loss method to improve the efficiency of classifiers.
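To make the multi-objective framing concrete, the sketch below filters a set of candidate reductions down to its Pareto front, the non-dominated set that algorithms such as NSGA-II rank first. The two objectives (number of features and classification error, both minimized) and the candidate values are assumptions chosen for illustration; the abstract does not state the objectives actually used.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical candidates: (number of features kept, classification error).
candidates = [(1200, 0.08), (400, 0.09), (400, 0.12), (150, 0.15), (900, 0.10)]
front = pareto_front(candidates)
print(front)  # [(1200, 0.08), (400, 0.09), (150, 0.15)]
```

Here (400, 0.12) and (900, 0.10) are dropped because (400, 0.09) is at least as good on both objectives. The remaining points are genuine trade-offs, so choosing among them is a decision left to the user of the optimizer.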

    Mitochondrial echoes of first settlement and genetic continuity in El Salvador

    Background: From Paleo-Indian times to recent historical episodes, the Mesoamerican isthmus played an important role in the distribution and patterns of variability across the double American continent. However, the amount of genetic information currently available on Central American continental populations is very scarce. In order to shed light on the role of Mesoamerica in the peopling of the New World, the present study focuses on the analysis of the mtDNA variation in a population sample from El Salvador. Methodology/Principal Findings: We have carried out DNA sequencing of the entire control region of the mitochondrial DNA (mtDNA) genome in 90 individuals from El Salvador. We have also compiled more than 3,985 control region profiles from the public domain and the literature in order to carry out inter-population comparisons. The results reveal a predominant Native American component in this region: by far, the most prevalent mtDNA haplogroup in this country (at ~90%) is A2, in contrast with other North, Meso- and South American populations. Haplogroup A2 shows a star-like phylogeny and is very diverse, with a substantial proportion of mtDNAs (45%; sequence range 16090–16365) still unobserved in other American populations. Two different Bayesian approaches used to estimate admixture proportions in El Salvador show that the majority of the mtDNAs observed come from North America. A preliminary founder analysis indicates that the settlement of El Salvador occurred about 13,400±5,200 Y.B.P. The founder age of A2 in El Salvador is close to the overall age of A2 in America, which suggests that the colonization of this region occurred within a few thousand years of the initial expansion into the Americas. Conclusions/Significance: As a whole, the results are compatible with the hypothesis that today's A2 variability in El Salvador represents to a large extent the indigenous component of the region.
Also concordant with this hypothesis is the observation of a very limited contribution from European and African women (~5%). This implies that the Atlantic slave trade had a very small demographic impact in El Salvador, in contrast to its transformation of the gene pool in neighbouring populations from the Caribbean facade.

    Measurement of χc1 and χc2 production with √s = 7 TeV pp collisions at ATLAS

    The prompt and non-prompt production cross-sections for the χc1 and χc2 charmonium states are measured in pp collisions at √s = 7 TeV with the ATLAS detector at the LHC using 4.5 fb⁻¹ of integrated luminosity. The χc states are reconstructed through the radiative decay χc → J/ψγ (with J/ψ → μ⁺μ⁻), where photons are reconstructed from γ → e⁺e⁻ conversions. The production rate of the χc2 state relative to the χc1 state is measured for prompt and non-prompt χc as a function of J/ψ transverse momentum. The prompt χc cross-sections are combined with existing measurements of prompt J/ψ production to derive the fraction of prompt J/ψ produced in feed-down from χc decays. The fractions of χc1 and χc2 produced in b-hadron decays are also measured.

    Measurement of the production of a W boson in association with a charm quark in pp collisions at √s = 7 TeV with the ATLAS detector

    The production of a W boson in association with a single charm quark is studied using 4.6 fb⁻¹ of pp collision data at √s = 7 TeV collected with the ATLAS detector at the Large Hadron Collider. In events in which a W boson decays to an electron or muon, the charm quark is tagged either by its semileptonic decay to a muon or by the presence of a charmed meson. The integrated and differential cross sections as a function of the pseudorapidity of the lepton from the W-boson decay are measured. Results are compared to the predictions of next-to-leading-order QCD calculations obtained from various parton distribution function parameterisations. The ratio of the strange-to-down sea-quark distributions is determined to be 0.96 +0.26/−0.30 at Q² = 1.9 GeV², which supports the hypothesis of an SU(3)-symmetric composition of the light-quark sea. Additionally, the cross-section ratio σ(W⁺ + c̄)/σ(W⁻ + c) is compared to the predictions obtained using parton distribution function parameterisations with different assumptions about the s–s̄ quark asymmetry.

    Measurements of fiducial and differential cross sections for Higgs boson production in the diphoton decay channel at √s = 8 TeV with ATLAS

    Measurements of fiducial and differential cross sections are presented for Higgs boson production in proton-proton collisions at a centre-of-mass energy of √s = 8 TeV. The analysis is performed in the H → γγ decay channel using 20.3 fb⁻¹ of data recorded by the ATLAS experiment at the CERN Large Hadron Collider. The signal is extracted using a fit to the diphoton invariant mass spectrum assuming that the width of the resonance is much smaller than the experimental resolution. The signal yields are corrected for the effects of detector inefficiency and resolution. The pp → H → γγ fiducial cross section is measured to be 43.2 ± 9.4 (stat.) +3.2/−2.9 (syst.) ± 1.2 (lumi) fb for a Higgs boson of mass 125.4 GeV decaying to two isolated photons that have transverse momentum greater than 35% and 25% of the diphoton invariant mass and each with absolute pseudorapidity less than 2.37. Four additional fiducial cross sections and two cross-section limits are presented in phase space regions that test the theoretical modelling of different Higgs boson production mechanisms, or are sensitive to physics beyond the Standard Model. Differential cross sections are also presented, as a function of variables related to the diphoton kinematics and the jet activity produced in the Higgs boson events. The observed spectra are statistically limited but broadly in line with the theoretical expectations.

    Search for squarks and gluinos with the ATLAS detector in final states with jets and missing transverse momentum using √s=8 TeV proton-proton collision data

    A search for squarks and gluinos in final states containing high-pT jets, missing transverse momentum and no electrons or muons is presented. The data were recorded in 2012 by the ATLAS experiment in √s = 8 TeV proton-proton collisions at the Large Hadron Collider, with a total integrated luminosity of 20.3 fb⁻¹. Results are interpreted in a variety of simplified and specific supersymmetry-breaking models assuming that R-parity is conserved and that the lightest neutralino is the lightest supersymmetric particle. An exclusion limit at the 95% confidence level on the mass of the gluino is set at 1330 GeV for a simplified model incorporating only a gluino and the lightest neutralino. For a simplified model involving the strong production of first- and second-generation squarks, squark masses below 850 GeV (440 GeV) are excluded for a massless lightest neutralino, assuming mass degenerate (single light-flavour) squarks. In mSUGRA/CMSSM models with tan β = 30, A0 = −2m0 and μ > 0, squarks and gluinos of equal mass are excluded for masses below 1700 GeV. Additional limits are set for non-universal Higgs mass models with gaugino mediation and for simplified models involving the pair production of gluinos, each decaying to a top squark and a top quark, with the top squark decaying to a charm quark and a neutralino. These limits extend the region of supersymmetric parameter space excluded by previous searches with the ATLAS detector.

    Measurement of the top pair production cross section in 8 TeV proton-proton collisions using kinematic information in the lepton plus jets final state with ATLAS

    A measurement is presented of the $t\bar{t}$ inclusive production cross-section in $pp$ collisions at a center-of-mass energy of $\sqrt{s}=8$ TeV using data collected by the ATLAS detector at the CERN Large Hadron Collider. The measurement was performed in the lepton+jets final state using a data set corresponding to an integrated luminosity of 20.3 fb$^{-1}$. The cross-section was obtained using a likelihood discriminant fit and $b$-jet identification was used to improve the signal-to-background ratio. The inclusive $t\bar{t}$ production cross-section was measured to be $260\pm 1\,\mathrm{(stat.)}\,^{+22}_{-23}\,\mathrm{(syst.)}\pm 8\,\mathrm{(lumi.)}\pm 4\,\mathrm{(beam)}$ pb assuming a top-quark mass of 172.5 GeV, in good agreement with the theoretical prediction of $253^{+13}_{-15}$ pb. The $t\bar{t}\to (e,\mu)+\mathrm{jets}$ production cross-section in the fiducial region determined by the detector acceptance is also reported. Comment: Published version, 19 pages plus author list (35 pages total), 3 figures, 2 tables, all figures including auxiliary figures are available at http://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/TOPQ-2013-06
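As a rough check of the "good agreement" statement in the abstract above, the quoted uncertainty components can be added in quadrature and compared with the gap between the measured value (260 pb) and the prediction (253 pb). This is only a back-of-the-envelope sketch: treating the components as independent and handling the asymmetric errors side by side are simplifying assumptions, not the procedure used in the ATLAS analysis, and the helper name `quadrature` is introduced here for illustration.

```python
import math


def quadrature(*parts):
    """Combine independent uncertainty components as the root of the sum of squares."""
    return math.sqrt(sum(p * p for p in parts))


# Values quoted in the abstract, in pb: stat, syst (+22/-23), lumi, beam.
measured = 260.0
up = quadrature(1, 22, 8, 4)    # upward total, about 23.8 pb
down = quadrature(1, 23, 8, 4)  # downward total, about 24.7 pb

predicted, pred_up, pred_down = 253.0, 13.0, 15.0

# The measurement lies above the prediction, so the relevant sides are the
# measurement's downward error and the prediction's upward error.
diff = measured - predicted
combined = quadrature(down, pred_up)
pull = diff / combined

print(f"total uncertainty: +{up:.1f} / -{down:.1f} pb")
print(f"difference: {diff:.0f} pb, about {pull:.2f} combined standard deviations")
```

Under these assumptions the 7 pb difference is roughly a quarter of the combined uncertainty, consistent with the agreement reported in the abstract.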

    Search for squarks and gluinos in events with isolated leptons, jets and missing transverse momentum at √s = 8 TeV with the ATLAS detector

    The results of a search for supersymmetry in final states containing at least one isolated lepton (electron or muon), jets and large missing transverse momentum with the ATLAS detector at the Large Hadron Collider are reported. The search is based on proton-proton collision data at a centre-of-mass energy √s = 8 TeV collected in 2012, corresponding to an integrated luminosity of 20 fb⁻¹. No significant excess above the Standard Model expectation is observed. Limits are set on supersymmetric particle masses for various supersymmetric models. Depending on the model, the search excludes gluino masses up to 1.32 TeV and squark masses up to 840 GeV. Limits are also set on the parameters of a minimal universal extra dimension model, excluding a compactification radius of 1/Rc = 950 GeV for a cut-off scale times radius (ΛRc) of approximately 30.

    Observation of top-quark pair production in association with a photon and measurement of the tt̄γ production cross section in pp collisions at √s = 7 TeV using the ATLAS detector

    A search is performed for top-quark pairs (tt̄) produced together with a photon (γ) with transverse energy greater than 20 GeV using a sample of tt̄ candidate events in final states with jets, missing transverse momentum, and one isolated electron or muon. The data set used corresponds to an integrated luminosity of 4.59 fb⁻¹ of proton-proton collisions at a center-of-mass energy of 7 TeV recorded by the ATLAS detector at the CERN Large Hadron Collider. In total, 140 and 222 tt̄γ candidate events are observed in the electron and muon channels, to be compared to the expectation of 79 ± 26 and 120 ± 39 non-tt̄γ background events, respectively. The production of tt̄γ events is observed with a significance of 5.3 standard deviations away from the null hypothesis. The tt̄γ production cross section times the branching ratio (BR) of the single-lepton decay channel is measured in a fiducial kinematic region within the ATLAS acceptance. The measured value is σ_fid(tt̄γ) × BR = 63 ± 8 (stat.) +17/−13 (syst.) ± 1 (lumi.) fb per lepton flavor, in good agreement with the leading-order theoretical calculation normalized to the next-to-leading-order theoretical prediction of 48 ± 10 fb.